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Abstract 

The most popular approach in extreme value statistics is the modelling of threshold 
exceedances using the asymptotically motivated generalised Pareto distribution. This ap- 
proach involves the selection of a high threshold above which the model fits the data well. 
Sometimes, few observations of a measurement process might be recorded in applications and 
so selecting a high quantile of the sample as the threshold leads to almost no exceedances. 
In this paper we propose extensions of the generalised Pareto distribution that incorporate 
an additional shape parameter while keeping the tail behaviour unaffected. The inclusion of 
this parameter offers additional structure for the main body of the distribution, improves the 
stability of the modified scale, tail index and return level estimates to threshold choice and 
allows a lower threshold to be selected. We illustrate the benefits of the proposed models 
with a simulation study and two case studies. 

Keywords: extreme value theory; extended generalised Pareto distribution; tail estimation; 
threshold selection; liver toxicity 



1 Introduction 



The area of extreme value theory focuses on the study and development of stochastic models 
that can be used for inference on applied problems related to the frequency of very big (or very 
small) values in random experiments. One such widely used model is the generalised Pareto 
(GP) distribution defined by its distribution function 



F{x;X) = 1- (l + Cx/a),^^^ , x > 0, 



(1) 



where A = {a, ^) is a vector of parameters in (0, oo) x (— oo, oo) and z+ = max(2:, 0). Consider a 
random variable X arising from an absolutely continuous distribution function Fx and let also 



x^^ = snp{x : Fx{x) < 1} be the upper end point of Fx- Pickands (1975) shows that if there 
exists a scaling function hx (u) : M — )• M+ , u < x^^ , such that the scaled excess random variable 
{{X — u)/hx{u)}\X > u converges in distribution to a non-degenerate limit as u 
this is necessarily of the same type as the GP distribution, i.e.. 
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X >u} =F{x-Xo), x>0, Ao = (l,e). 



then 



(2) 



hx{u) 

Without loss of generality, the scaling function hx can be defined by the reciprocal hazard 

; a necessary and 
^, meaning that 



function of X, i.e., hx{u) = {1 — Fx{u)} / F^{u). Pickands (1986) shows that a necessary and 
sufficient condition for twice differentiable convergence is lim^^^Fx h'-^[u) 
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not only limit ^ holds but the corresponding densities and derivatives of densities converge. 
The parameter ^ is most commonly referred to as the shape parameter or the tail index of the 
distribution and we adopt the latter since we use the word shape for a different characteristic in 
the paper. The sign of the tail index ^ indicates the decay of the tail of Fx- ^ > means that 
Fx has a heavy-tailed distribution, ^ — )• corresponds to exponential decay and < means 
that Fx has finite upper end point, i.e., x^^ < oo. 



Consider a sequence of independent and identically distributed measurements {xi, . . . , x„} aris- 
ing from the random variable X. Standard practice in applications where we want to esti- 



mate extreme quantiles of the underlying distribution is to follow the approach of Davison 



and Smith (1990) and assume that the limit relationship ^ holds exactly for some threshold 
u < maxj=i^...^„(xi); i.e., X — u\X > u has distribution function where the function 

hx{u) is absorbed in the distribution function as a threshold dependent scale parameter 
Uu > 0. Extreme events are defined by the threshold exceedances {xi : Xi > u} and sub- 
sequently, the GP distribution is fitted to the random sample of excesses {xi — u : Xi > u} 
using maximum likelihood techniques. The fitted model is then extrapolated to levels above 
which no data are observed. Two central assumptions are imposed when using this procedure 
in practice. One is that the asymptotic argument of equation ^ is valid for the distribution 
function of the data under study and second is that an appropriate threshold u can be found 
such that the GP model provides a good approximation to exceedances of u. For a number of 
reasons such as the cost and time of collecting data, few observations of a measurement process 
might be recorded in applications. Since u < maxj=i^...„(xj) and n is small, Fx{u) <C 1 and so 
limit ([2]) is likely to be a poor approximation to the distribution of exceedances of u in such cases. 

Figure [1] shows such an example of residual bilirubin data collected from a clinical study of 606 
patients who were randomized to 4 doses of a drug, the highest dose of which is considered 
to have potential for liver toxicity. Data were available prior to treatment (baseline visit) and 
after 6 weeks of treatment (postbaseline) . The residual bilirubin observations are the residuals 



of linear median regression models of postbaseline on the baseline in each dose, see also South- 



worth and Heffernan (2010) for a similar analysis. 



The 4 different doses are coded here with increasing dose as A, B, C and D. The published 
literature reports jaundice, hepatitis and similar symptoms in approximately 1 out of 500 pa- 



tients taking the dose D of this drug, see Southworth and Heffernan (2010). According to 



FDA (2008) joint occurrence of extremes of the total bilirubin and aminotransferase laboratory 



variables are indicative of drug induced liver toxicity. Therefore, proper statistical modelling 
of their extremes is vital for assessing the liver toxicity of a new drug. However, the limited 
amount of information in each dose illustrates the problems of relying on the GP distribution. 
We will return to the analysis of these data in § |4.2[ 



The issue of specifying an appropriate threshold for fitting the GP distribution constitutes the 



major problem of the Davison and Smith ( 1990 ) approach. Typically, a threshold u is chosen as 



the lowest possible value above which the estimates of the tail index ^ and the modified scale 



a* = Uu — stabilise (see Coles, 2001 ). Departures from the GP distribution again imply that 



such a threshold might not be observable. Moreover, the higher the threshold the larger the 
sampling variability of the estimates of the modified scale and tail index parameters which leads 
to estimates being unstable over different thresholds. A variety of methods have been developed 



in the literature to address the problem of departures from the GP model assumption. Peng| 



(1998), Feuerverger and Hall (1999) and Beirlant et al. (1999) among others, use second order 



50 100 150 

Patient 



50 100 
Patient 



• • • • • JL^ • 

^ X*.; 



.* . • . ..• :. 



Figure 1: Residual total bilirubin variable measured from 606 patients at four different doses 
A, B, C and D. 



refined models to select the threshold and proceed with the estimation of tail characteristics by 



using the GP distribution. Beirlant et al. (2009) also use a second order approach for the mod- 
elling of the tail probability as well as for the extrapolation. An even more unsettling feature 
arises in cases where two or more seemingly plausible thresholds yield significantly different 
estimates of extreme quantities of interest. In such circumstances, the threshold selection is 



liable to be the subjective choice of the practitioner. Frigessi et al. (2002) propose unsupervised 



tail estimation with the use of a dynamic mixture model aimed for the entire distribution of the 



data. Tancredi et al. (2006) take into account the threshold uncertainty by allowing the thresh- 



old to be estimated as a statistical parameter in a mixture model. MacDonald et al. (2011 ) also 



use a mixture model where the non-extreme part of the data is estimated non-parametrically. 



Wadsworth and Tawn (2011) exploit penultimate theory to model threshold uncertainty and 



provide a likelihood ratio testing procedure for the threshold selection. 



All of the aforementioned approaches are either based on second order asymptotic arguments 
adding a little extra flexibility to the fit of the GP distribution or they model the entire dis- 
tribution, i.e., model the body as well as the tail. Furthermore, some of these approaches are 
confined to the heavy-tailed case > (Beirlant et al. , 2009). Our goal in this article is to 



construct parametric models for exceedances over thresholds that are more flexible than the 
GP distribution and add further insight on the threshold selection problem. For this purpose, 
in § 2.3 we construct a new class of probability models with distribution function G{x] k,\), 



where k > is a shape parameter, that generalises the GP distribution in the sense that there 
exists a K* > such that G(x;k*,A) = F{x;X). This parameter k offers additional structure 
by inducing skewness to the GP distribution while retaining ^ G M as the tail index. Thus, a 
class of departures from the GP assumption of limit ^ are captured by this shape parameter 
K. We will show that the inclusion of k improves the stability of the estimates of the tail in- 



dex and modified scale parameter and allows a lower threshold to be selected. Consequently, 
extrapolations based on different thresholds are more stable than those obtained from the GP 
distribution, a feature which makes the choice of threshold less important. 

In §[2] we present three extensions of the GP distribution and derive a characterisation of a class 
of models which includes these examples. The new models are given along with the description 
of the methodology implemented for the analysis of threshold exceedances. A statistical test 
aiding threshold selection is also illustrated. The effect of the new probability models on the 
statistical analysis of extremes is assessed with a simulation study in § [3| Finally, in § [4] we 
illustrate the benefits of the new models through the analysis of extreme flow data of the River 
Nidd, a dataset with known difficulties in threshold selection, and the clinical trial data in 
Figure [l} 

2 Theory and Models 

2.1 Notation and motivation 

Here and throughout, we denote by /3(-;a,6) and 7(-;'^) the regularised incomplete beta and 
regularised lower incomplete gamma functions given by 

a, b) = — ^- / ^''-^(l - 1)'-^^, < x < I, 
Be{a,b) Jq 

7(2/; a) = -^ r t^-^e-'dt, < y < 00, 

r(a) Jo 

with a,b> 0. We also denote by /3~^{-,a,b) and 7~^(-,a) their corresponding inverses. 

Our examples are motivated by transformations of the form W := F~^{V; A), where F is the 
GP distribution function given by equation ([T]) and ^ is a random variable with support the 
unit interval I = [0, 1]. The distribution function and density function of W are denoted by G 
and g, respectively. If 1/ ~ uniform(0,l) distribution then W ~ GP(o", ^) but if 1/ is on I with 
a distribution that contains the uniform(0, 1) as a special case then more flexible distributions 
than the GP are produced for W. 

2.2 Probability integral transform and new models 

Let riy, Qy be two sample spaces. Define Y : Qy — )• M to be a random variable with con- 
tinuous probability density (distribution) function /y(a;;T7) {FY{x;r])) parametrised over an 
m-dimensional vector of parameters 17 G H C M™. Let also V : ^ly — )• M be a random variable 
with continuous probability density (distribution) function fv{v]6) {Fv{v;6)) parametrised 
over a d-dimensional vector of parameters S C M"^, with fy(y;6) = 0, if ^ [0)1]- We 
also assume the existence of a 0* G such that fv{v;6*) = 1, \/v £ [0, 1], i.e., a special case 
of V follows the uniform(0,l) distribution. Then, the distribution and density functions of the 
transformed random variable FY^{V;ri) are given by 

K{x;rj,e) = Fr {FyHv-v) < x} = Fv {FY{x;r]y,e} , (3) 

k{x; T7, e) = K'{x; 6, rj) = fy {Fy (x; r?); 0} /y(x; r,). (4) 

Therefore the distribution function K{x; ij, 0) is a generalised distribution function for Fy{x; rf) 
in the sense that K{x;r],6*) = Fy{x]Vi), i.e., the distribution function FY{x;r]) is a special 



4 



case of K{x; t], 6). 



Equations Q and |4] provide the basis of all subsequent generalizations we propose. The distribu- 
tion function of V can be constructed in several different ways, one of which is the composition 
of a distribution function Li(-;ip), ■?/' G ^' C R"^"™, dim{ip) = d — m, d > 2m, with the inverse 
of a distribution function LQ{-;rj), ?7 G H C M™, dim(T7) = m, where Lq and Li are defined 
on the same support and Lq is a special case of Li (d > 2m is required for rj to have lower 
dimension than ■0), that is 



Fv{v;e) 



V <0, 
Li{Lo'{v;r,);ip} 0<v<l, 

1 v>l, 



(5) 



where G C M'^ consists of elements taken from the combined vector {ip,ri). When i/) 
and rj have elements in common, the combined vector is interpreted as the vector consisting 
of the unique elements of {ip,T]) that span 0. Note here that 6 is not necessarily equal to 
the combined vector rj) since common elements of ip and rj, if any, are allowed to can- 
cel in composition ([s]). For instance, when Li and Lq are the distribution functions of the 
gamma(K, cr) and exponential(c7) random variables, k > 0, o" > 0, i.e., Li{x;ip) = '){x/a,n) 
and LQ{x;r]) = 1 — exp{— x/cj}, for x > 0, then equation ([s]) yields the distribution function 
Fv{v; 0) = 7{— log(l — v), k}, Q <v <\. Here {■?/>, r/, 6} = {(k, ct), a, k} and {d, m, dim(0)} = 
{3,1,1}. Moreover, this distribution function reduces to the uniform(0, 1) distribution when 
K = 1, i.e., e* = 1. 



Below we present three new probability density functions that are generalisations of the GP 
density and can be obtained by transformations of the form Fy^iV; rj), where Fy = F, rj = \ 
and y is a random variable that satisfies equation ^ . Owing to the fact that each model extends 
the GP distribution in a parametric fashion, we refer to the new models as the extended GP 
(EGP) models and denote their density function by g{x; A, 0), for a; > 0. 

Example 1 Let Fv{v]0) = /3{l- (1 - k, 6 = (k,^) G (0,oo) x (-(X),cx)). Then 

the transformed random variable F^^iV'^X) has probability density function given by 



g{x;\e) = < 



(6) 



I r(K)- 



Example 2 Let Fv{v;0) = 1 — 7{— log(l — v),k}, 9 = k € (0,oo). Then the transformed 
random variable F^^(V; A) has probability density function given by 



a 



-1 



-i/C-i 



g{x;X,e) 



^ ^ 0. 



(7) 
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Example 3 Let Fyiv) = v'^, 6 = n £ (0, oo). Then the transformed random variable 
F~^[V; A) has probabihty density function given by 



g{x;X,e) 



' ^ {l - (1 + ^x/a)-'/^y (1 + ^x/a)-'/^-' e / 0, 



1 - e 



'X/cr 



-x/a 



(8) 



We write W ~ EGP1(k,o-,0, W ~ EGP2(k,(j,C) and W ~ EGP3(k,ct,0 when the density of 
a random variable W is given by expression ([6]), ([T]) and ([s]) respectively. In all examples in 
addition to the GP parameters a and ^ there is a shape parameter k > that adds more flexi- 
bility in the main body of the density and does not alter its tail behaviour, i.e., all distributions 
have tail index ^ G M. 



All models reduce to the GP density when k = 1. More specifically, the EGPl model can 



be viewed as an extended Snedecor's F^^^^^i distribution (Abramowitz and Stegun, 1965) with 



parameters vi > and 1^2 S M, that allows for negative 1^2 giving finite upper bound for this 
distribution. When ^ > and k = a, the density reduces to the Fl,-^^^^^ distribution with z^i = 2k 
and 1^2 = 2/^. Additionally for ^ > 0, the EGPl model is a well used loss distribution in actuarial 
science known in that literature as the generalised Fareto distribution (iHogg and Klugman 1984 



Klugman et al. , 2008). The EGPl model is an extension of this loss distribution for the case 



< 0. The EGP2 model can be viewed as a model that generalises the GP density in a similar 
way to the gamma generalising the exponential distribution. Specifically, the GP distribution 
is the distribution of the random variable (e^^ — 1)(t/C, where Y follows the exponential(l) 



distribution (Hosking and Wallis, 1987). Analogously, the EGP2 model is the distribution of 
the random variable (e^'^ — 1)<t/^, where Z follows the gamma(K,l) distribution. Finally, the 
EGP3 distribution function is simply obtained by raising the GP distribution function F{x; A) 
to a power k > 0. 



2.3 Construction of extreme value models 



Expression ([sj) represents a class of distribution functions Fy. However, unlike the extended 



models of > 



2.2 



the transformation Fy^iV; A) does not always ensure that the resulting random 
variable has a tail index ^ for all values of 9. One such example can be obtained by taking Fy = 
Li o Lq^ , with Li and Lq being the distribution functions of Weibull(«;, a) and exponential(cj) 
random variables, k, a > 0, i.e., Li{x) = 1 — exp {— (x/cj)''}. In this case, the transformed 
variable F~^{V; X), has survival function G given by 

Gix; n, A) = exp [- {C^ log (1 + Cx/a)^y] , x>0 

which is a slowly varying function at 00 for k < 1, ^ > and is therefore considered to be a 
'super-heavy-tailed' distribution under this combination of parameters which means that the 
parameter is no longer the tail index of this distribution. Hence, we proceed b characterising 
in Theorem [1] the class of distribution functions Fy under the assumption that FY^(y;r)) has 
tail index G M for all values of 6. 

Theorem 1 Let d > 2m where d, m E N and consider the parameter vectors (r/,i/',0) E (H x 



^' X 9) C (I 



) with dim(r/) = m and dim{'ip) = d 



m. 



Let Fy{x; rj) be a 



twice dijjerentiable distribution function of a random variable Y admitting a density function 
fyix, v)- ^^'So V be a random variable with twice differentiable distribution function Fy{v] 6) 
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so that its density function satisfies fv{v] 0) = when v ^ [0, 1]. Then the transformed random 
variable -Fy^(y;r/) has tail index G M., if and only if, the distribution function of V can be 
represented by 



Fv{v;e) = l-exp|-^ 



dz 



UY{Fy\zy,ri}Ciz-ri,tp) 



0<v<l, 



(9) 



where C{z;rj,il)) = J[s{z;ri,'tp)/ fY{FY^{z);r]}]dz and s is a real-valued function with 
limz-,1 s{z;r],ip) = I and /g^ [^/y{Fy ^(z); 77}C(z; 77, i/?)] ^ dz = 00. 

A proof is given in Appendix. Theorem [T] gives the characterization of the class of distribution 
functions Fy from which can be constructed a new class of models, FY^{V;r]), that have tail 
index G M. Under the assumption of Fy = F and 77 = A, i.e., the case in the examples of 
§ 2.1, we obtain the following. 

Corollary 1 Let V be a random variable as in Theorem^ Then F^^{V;X) has a tail index 
^ G M i/ and only if the distribution function of V is given by 



Fv{v; 6) = 1 — exp 



({1-z 



,1+5 f s{z]\,xIj) 



dz 



-1 



dz 



< < 1, 



(10) 



(1 

where s is a real-valued function with lim^-i.! s(z; A, = 1 and /o^[^(l— 2)"'^^'' / ('/^'^ i+g dz\ ~^dz 



00. 



Any real- valued function s with the specific properties of Theorem [T] would give rise to a valid 
distribution function Fy- As an example, consider the real-valued function s{z;X,ip) where 
ip = K £ {0, 00), given by 

s{z; A, -0) = z-^ {z"" + K - 1 + z^+^e - ^(k + 0} / - 1)} , < z < 1. 

Then, equation ( (Tol ) yields 

(1 - z") 



Fv{v; 0) = 1 — exp 



<z'^-i(l-z)i+5 



+ c 



-1 



dz 



c G 



(11) 



When c = 0, the distribution function Fy{v] 9) = v'^ of Example 3 in § |2.2| is obtained. When 



c > and ^ > 0, equation (11) yields a valid distribution function Fy{v; 6). However, c has to 



be necessarily equal to for Fv{v; 0) to be a valid distribution function when ^ < 0. 
2.4 Penultimate approximations 

We have so far presented three examples from a general class of models that extends the GP 
distribution by incorporating additional parameters while preserving the tail index ^ G M. To 
characterise the deviation of the tail behaviour of the extended models from the GP distribution 



we examine the penultimate approximation of the tail index proposed by Smith (1987), i.e.. 



we examine the rate of convergence of the three extended models given in § 2.2 to the GP 



survival function in limit expression ([2]). Let be a random variable with twice differentiable 
distribution function G{x) and density function g[x). Denote also the reciprocal hazard function 
of W by hw{x) = [1 — G[x)\/ g[x). Smith (1987) shows that for each u and x > there exists 
y G [u, n -|- such that 



1 - G{u-\-xh{u)) 
1 - G{u) 



{l + h'{y)x}\ 



(12) 
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By virtue of expression ^ the scaled excess random variable {{W — u) / h{u)}\W > u converges 
in distribution to the GP distribution if lim„_5.^G h'{u) = ^ G M. This is one form of the von 
Mises condition which is a necessary and sufficient condition for the convergence of the scaled 
excess of any random variable, with twice differentiable distribution function, to the GP dis- 
tribution. Defining u„ = G^^(l — 1/n), the penultimate approximation to the tail index in 
equation (12) is given in terms of n by h'{un), as n — )• oo. Moreover, the rate of convergence to 
the GP distribution is given by 0{\h'{un) — 



Define A,,^ = {\^\/Be{K, l/|e|)} and let D^,^ = - \C\)A^,^ and E^,^ = \^\A^,^ for 

7^ 0. Table [T] shows the leading order terms from the penultimate approximations of the tail 
index for the EGP models. For ^ € [—1, 1], the EGP3 distribution admits the fastest rate of 
convergence whereas for ^ G [—1, l]'^ the EGPl distribution has the fastest rate of convergence 
among the extended models. Irrespective of the value of ^ the EGP2 distribution has the 
slowest rate of convergence. Explicitly, for ^ 7^ and ^ = 0, the rate of convergence for the 
EGPl, EGP2 and EGP3 distributions is of order {n-l«l^(«>°)n-2|?l^(«<o), (logn)-\n-i} and 
{(logn)^^, (logn)^^,n^^}, respectively. Here G A) denotes the indicator function which 
takes the value 1 when ^ G A and otherwise for any set yl G M. 



Table 1: Leading terms of threshold Un and penultimate approximations h'{un) for Examples 1- 



3 of S 2.2 



Model 



Ur, 



/i'K)(c^o) 



EGP2 {aji) {n«r(K)-« - 1} 
EGP3 {ali){{Knf -\) 



C + (logn)~H«:-l) 
^ + n-i(K-l)(^-l)/(2K) 



-(logn) ^(k - 1) 

-(logn)~^(K — 1) 
—n~^{^K — 1)/ K 



2.5 Statistics using extended GP Models 

We propose the use of the EGP models as alternatives to the GP distribution for the mod- 
elling of the excess random variable X — u\X > u. Specifically, given a random sample x we 
model the exceedances £c>„ = {xi : Xi > u} =: (xi, . . . with the EGP(k, cr, ^) family of 

distributions. Maximum likelihood is used to estimate the parameters (k,(T, ^), i.e., maximum 
likelihood estimates satisfy 

(k, a, C) = argmax 1{k, a, ^|a;>«) 

where S = (0,oo) x (0,oo) x (—00,00) and o", ^|a;>t() denotes the log-likelihood of the 
parameters given the observed sequence of excesses of length Uu = #{xi > u}, i.e., for ^ / 

r^^P^K, a, e|a=>.) = n„ log I ^^^'^'1^^,,^ } + - 1) E log [1 - {1 + ^(^^ - ^)/^}+ ^'^^ 

riu 

-(1/e + 1) ^ log {1 + C{xi - n)/(T}_, , 
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a 



+ log [r' log {1 + - u)/a}_ 



i=l 



n 

(i/e + i)^iog{i + e(xi-^z)M 



i=l 



a, ^|a^>„) = nu log {k/cj) + - 1) ^ log [l - {1 + i{xi - u)/a}^^'^ 

1=1 

+ 1) log {1 + - ^x)M+ . 

i=l 

Inference for extreme quantiles is made via the T-observation return level xt which is defined 
by the level that is exceeded on average once every T observations. The T-observation return 
level is the solution of Pr(X > xt) = 1/T. Under the assumption that the exceedances above 
a threshold u are well modelled by the EGP family of distributions and such that xt > u, the 
T-observation return level for ^ 7^ is given by 



Xrp 



EGPl 



u + 



a 
a 



|i_;3-i(i-(Tc„)-\K,icri)} 



1 



a;f,GP3 = u + 



{l-(l-(Ta)-i/'^)}"^-l 



where = Pr(X > u). Return level estimates are obtained by substituting the parameter 
values by their maximum likelihood estimates whereas standard errors and confidence intervals 
are derived by the delta method or from the profile likelihoods of the parameters. 

Aside from the model fitting of the exceedances with the EGP family of distributions, additional 
diagnostics for the GP distribution can be obtained. In particular, extra insight about the 
convergence in expression ([2]) can be sought from the EGP models by testing the statistical 
hypothesis 



Ho : {K,a,C) € So vs Hi : {K,a,^) £ Si, 



(13) 



where So = {(l,cr,0 : G M+,^ G M} and Si = {(k,cj,C) : k G M+ \{1},o- G M+,C G M}. Given 
the sample of excesses x^u, the generalised log- likelihood ratio test statistic reads 

An„(a3>«) = 2 [sup{^(K;,cr,^|a;>„) : G 5"} - sup ct, ^|a;>„) : {K,a,^) G 5o}] ,(14) 

where S = Sq U Si. From asymptotic likelihood theory as — )• 00, converges in distribu- 
tion to the chi-squared with 1 degree of freedom under Ho- Therefore, tests of the statistical 



hypothesis (13) can be made on the basis of the asymptotic distribution of A„^. Moreover, the 
limit expression ([2]) suggests that if the GP distribution is a reasonable model for the observed 
exceedances above a threshold u', then exceedances above a higher threshold u" > u' should 
also follow the GP distribution. This argument suggests plotting k against u and selecting the 
threshold as the lowest possible value at which k is not significantly different from 1 and the 
estimated modified scale and tail index are constant for all u" > u' . 
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3 Simulation Study 



We illustrate the impact of the extended models on the tail estimation using normal simulated 
data. All comparisons are based on the root mean square error (RMSE) performance of a range 
of estimated extreme quantiles using various sample sizes for the simulations. Specifically, for 
each distribution 10000 samples of size n = 100, 1000, 10000 were generated. The GP, EGPl 
and EGP2 distributions were fitted to the exceedances of each sample above a range of N 
equally spaced thresholds ui, un, with ui = <I>^-'^(l/n) and un = ^^^{1 — 30/n). Results ob- 
tained from the EGP3 model are not shown as they are similar to the EGPl and EGP2 models. 
This grid was chosen such that ui and um correspond approximately to the minimum possible 
threshold, i.e., all data are above ui, and uat is the threshold above which 30 data points are 
observed on average, respectively. At each threshold we computed Monte Carlo estimates of the 
RMSE of the T-observation return level estimate. For each sample size n used in the simulation 
study, we chose two different values of T, given by Ti/n = 1.5, 5, for i = 1,2, corresponding to 
short and long extrapolations. 

Figure [2] shows the RMSE output of the simulation study for the normal simulated data. Results 
illustrate improvement in inference using the EGP models over the GP model for both return 
level estimates and each sample size as the minimum RMSE is attained for the two EGP 
models, with their performance being almost indistinguishable at this value. More precisely, 
this improvement is largest in the small sample case (n = 100) where the optimal choice of the 
threshold according to the lowest RMSE is ui = <I'~^(1/{100}). This illustrates the advantage 
of fitting the EGP models to the whole data in small sample size cases instead of the GP 
distribution. For the Ti-observation return level in the n = 100 case, the EGP estimates yield 
higher bias and lower variance than the GP estimates whereas for any other combination of 
sample size and return level, the EGP estimates have lower bias and either slightly lower or 
higher variance at the threshold where the minimum RMSE occurs. From Table [2] we also have 
that as the sample size increases, the absolute difference of the corresponding optimal thresholds 
and RMSE of the EGP distributions and the GP distribution diminishes. This is an expected 
phenomenon which is justified by the validity of the asymptotics of extreme value theory as 
sample size increases. 

Table 2: Optimal thresholds for the estimation of the return levels for each sample size. 





Ti 


T2 




100 


1000 


10000 


100 


1000 


10000 


■"EGP 
MGP 


-2.32 
-0.33 


0.05 
0.45 


1.48 
1.70 


-2.32 
-0.23 


0.51 
0.68 


1.44 
1.44 



Figure [3] shows the Monte Carlo estimates as well as the estimated uncertainty of the shape 
parameter k from the EGP2 model plotted against the threshold for all sample sizes. Note also 
that the estimates obtained from the EGPl model are close to the EGP2 estimates and therefore 
are not shown here. All graphs illustrate the same feature, i.e., k stabilises around the value 
1 as the threshold u increases. Additionally, the minimum thresholds at which the value 1 is 
inside the sampling distribution of k are similar to the optimal thresholds of Table [2] for the GP 
model, denoted by uqp. This feature demonstrates the usefulness of this plot as an additional 
diagnostic for the GP modelling framework. The 95% pointwise confidence intervals are largest 
for small and large threshold values. This feature is explained by the greater dependence of 
parameters ^ and k at low threshold values (revealed by the profile likelihood plots of ^ and n 
that are not shown here) and the few data points at high threshold values. 
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Threshold 

Figure 2: RMSE for the Ti-(top row) and T2-observation (bottom row) return level estimates 
obtained from EGPl (solid black), EGP2 (dashed black) and GP (grey). Columns correspond 
to the three different sample sizes n = 100, 1000, 10000 (from left to right). 
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Figure 3: Monte Carlo estimates (median) of parameter k. (black dots) plotted against the 
threshold for each sample size. Grey-shaded areas correspond to the 95% pointwise equal tail 
confidence intervals. 
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4 Applications 



4.1 River Nidd Data 

We now analyse 154 exceedances of the threshold 65m^s~^ by the River Nidd at Hunsingore 



Weir from 1934 to 1969 taken from NERC (1975). This data set constitutes the best known 



example with apparent difficulties in threshold selection and the modelling of the tail using the 



GP distribution, studied previously by Hosking and Wallis (1987), Davison and Smith (1990), 
Tancredi et al. (2006) and |Wadsworth and Tawii (2011). Figure [4| shows the parameter stability 
plots from the EGPl (left) and GP (right) models over a grid of thresholds 65.08, 88.61 
along with the histogram of the data. Threshold selection from the GP model based on the 
stability of the tail index and modified scale parameters is not straightforward. In contrast, 
the tail index and modified scale estimates from the EGPl model appear to be stable over the 
plotted range of thresholds. Hence we select u = 65m'^s~^ (all data points) for the fit of the 
EGPl distribution. Moreover, the fact that (kegpii o"gQp^, .^egpi) stabilise to values around 



o 

H 



0, 
O 

<b 



LD 
O 



O 

d 



65 



70 



75 



80 



85 



65 



70 



75 



80 



85 



O 



Ln 
d 



o 
d 



















1 1 

65 70 


1 1 

75 80 


1 

85 




u 





0, 

<b 



65 



70 



75 



80 



85 



o 

H 



LD 



LD 

d 



65 



- - \ ... 



... •. ".4 • 



70 



75 



80 



85 



o _, 
d 



0) d 



o 
o -I 
d 




70 



130 



190 



250 



300 



u Flow rate 

Figure 4: Maximum likelihood estimates and 95% pointwise equal-tail confidence intervals of 
tail index, modified scale and shape parameter (^,0"*, n) based on asymptotic normality: EGPl 
(left column) , GP (right column) . Bottom right graph shows the histogram of the River Nidd 
data along with the estimated density (black solid line) from the EGPl model fitted to the 
exceedances above u = 65m^s~^. 



(1,-16,0.46) for the threshold values above 74 suggests that any threshold in this region is 
reasonable for the GP distribution. However, small deviations of kegpi from the value 1 in this 
threshold region seem to have an impact on the stability of the GP estimates and the lowest 
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threshold where k is very close to 1 is 75.3m^s ^. This finding is also consistent with that of the 
Wadsworth and Tawn (2011 ) approach where they choose the value of 75m^s~^. We thus select 



and u = 75.3m'^s ^ for the GP distribution. Note also that the tail index and modified scale 

estimates from the EGPl fitted above u = 65m^s~^ (0.44,-19) are similar to those obtained 

1 



from the GP fitted above u = 75.3m'^s~^ (0.48, -17). 



To assess the impact on extrapolation, we look at the stability of return level estimates with 
respect to the choice of the threshold. Figure [5] shows return level estimates obtained from the 
EGPl and GP models on the same grid of thresholds. Clearly, inference made on the basis of the 
EGPl model yields much more stable results in comparison with the GP model. Return level 
estimates obtained from the EGPl model gradually decrease with increasing threshold whereas 
estimates obtained from the GP model vary irregularly. This feature illustrates that the choice 
of threshold is less important for the Nidd data while using the EGP class of distributions. 
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Figure 5: Estimates of 10-, 50-, 100-, 200- and 500-observation return level obtained from fitted 
EGPl (left) and GP (right) models above the range of thresholds 65.08, . . . , 82.6 coded here 
by numbers 1,...,8 respectively. Numbers 1 and 5 correspond to thresholds 65.08 and 75.3 
respectively. 



4.2 Pharmaceutical Application 



We now return to the analysis of the residual bilirubin data shown in Figure [T| As already 
mentioned in § [T| the identification of liver toxic drugs is a multivariate extreme value problem 
in which the joint occurrence of extremes of residual bilirubin and other laboratory variables 
must be well modelled. However, as any multivariate extreme value analysis necessitates, the 



marginal extremes of these variables have to be modelled first. Southworth and Heffernan (2010) 



analysed the extremes of all laboratory variables taken from the same dataset with the GP mod- 



elling approach of Davison and Smith (1990), taking the threshold as the 70% quantile of the 
data. They found dose response relationships for all liver related laboratory variables other 
than residual bilirubin, justified by GP models with scale or tail index parameters linear in 
dose. Our primary objective in this analysis is to use the EGPl distribution of 



2.2 to model 



the extremes of the residual bilirubin and to test for relationship with dose. Using the EGP 
models of § 2.2 allows the inclusion of more data points which might reveal evidence of rela- 
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tionship between residual bilirubin and dose, missed by Southworth and Heffernan (2010). To 
assess the relationship of residual bilirubin with dose we use generalised likelihood ratio tests 
between models that have dose dependent parameters and models with the same parameters 
across doses. The practice of pooling parameters and more specifically of the tail index in the 



extreme value modelling framework can be found in various applications including Coles and 



Tawn (1990); Cooley et al. (2007) and Davison et al. (2011) to name but a few. 




~i 1 1 1 1 1 1 r~ 

-0.6 -0.5 -0.4 -0.3 -0.2 -0.1 0.1 




~i 1 1 1 1 1 1 r~ 

-0.6 -0.5 -0.4 -0.3 -0.2 -0.1 0.1 



Figure 6: Left: estimated shape parameter for dose levels A, B, C, D (black lines) and under 
the assumption of common shape across doses (dark grey line). Right: estimated tail index for 
dose levels A, B, C, D under common shape across doses (black lines) and under the assumption 
of common shape and common tail index across doses (dark grey). Light grey areas correspond 
to a set containing all 95% pointwise confidence intervals based on asymptotic normality for 
the parameter estimates shown with black lines. The set is constructed by the minima (lower 
boundary) and maxima (upper boundary) of the confidence intervals. 



Let — u\X^ > u he the excesses of the residual bilirubin variable over the threshold u at 
dose j = A, B,C, D. We initially fit the EGPl model to the excesses over thresholds ranging 
from -0.65 (1%) to 0.15 (77%) by allowing separate shape, scale and tail index parameters 
for each dose, i.e., — u\X^ > u ~ EGPl(Kj, cij, ^j), for dose j. The numbers in brack- 
ets are the corresponding sample quantiles of the combined data. The left plot of Figure |6] 
shows the maximum likelihood estimates PiA, ■ ■ ■ over the threshold values. A feature re- 
vealed from this graph is that the estimated shape parameters appear to be similar across 
the doses for thresholds greater than —0.51. This is also supported by the generalised like- 
lihood ratio test of the hypothesis Hq : {ka, ■ ■ ■ , kd) G Q vs i?i : {ka, ■ ■ ■ , kd) G Q^, where 
Q = {{ka, ■ . . , kd) G : KA = ■ ■ ■ = kd} and Q'^ is the complement of the set Q. Specifically, 
the generalised likelihood ratio test failed to reject the null hypothesis at all thresholds other 
than the threshold values below -0.51. Thus, we proceed to the analysis of the bilirubin data 
with the estimated common shapeparameter shown with the dark grey line in the left plot of 
Figure [oj The right plot of Figure [g] shows the maximum likelihood estimates Ca, - ■ ■ ,^0 under 
the assumption of common shape across doses. In this case, the generalised likelihood ratio test 
failed to reject the null hypothesis of common tail index over dose at all thresholds. We found 
that the simplest model selected by generalised likelihood ratio tests is with common shape, 
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scale and tail index parameters for all doses. We also found similar results regardless of the or- 
der according to which the pooling of parameters was conducted. This suggests that there is no 
evidence of relationship between the residual bilirubin and dose for all thresholds greater than 
-0.51, at the significance level of 5%. However, for thresholds below —0.51 there is evidence of a 
relationship with dose as indicated by the significant increase in the shape parameter estimate 
for dose D. This change indicates larger quantiles for dose D than for the other doses. 



Figure [7] shows the quantile-quantile plots for the EGPl and GP models with common shape, 
scale and tail index parameters among doses, fitted to the threshold exceedances above 0.10 
(30%) and -0.13 (70%), respectively. The parameter estimates obtained from the EGPl and GP 
fits are (kegpi, o-egpi, Iegpi) = (1.29,0.25,-0.24) and ((Tgp,^gp) = (0.21,-0.27) respectively. 
Their corresponding standard errors are (0.09,0.02,0.05) and (0.01,0.04). For the GP model 
we used Southworth and Heffernan (2010) choice of the 70% quantile which is consistent with 
the stability of the parameter estimates. For both models, the fit is good as the majority of the 
observed data points lie within the 95% pointwise tolerance intervals. 




Figure 7: Quantile-quantile plots to assess the fit to the exceedances of the EGPl (left) and GP 
(right) models. Dashed lines show the 95% pointwise tolerance intervals. 

The best fitting EGPl model has k significantly different from 1, and hence provides evidence of 
a departure from the GP distribution at the selected threshold. However, above the respective 
thresholds used to fit the two models there is no apparent difference in the quality of the fits. 
The finding of no evidence of a dose effect in the EGP models is identical to findings of the 
previous GP analysis. Despite this failure to identify a dose effect for thresholds above —0.51, we 
believe our analysis offers considerable benefits. Specifically, due to being able to substantially 
lower the threshold used relative to the GP analysis, larger sample sizes are used and thus the 
power of a test for dose effects in the residual bilirubin data is increased. 
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A Proof of Theorem 1 

Assume that Fy can be represented by equation ([9]). Let K{x;r],6) and k{x;r],6) be the 
distribution function and density function of the transformed variable W = Fy^iVirj). Dif- 
ferentiability of fy and fv implies that W will have tail index ^ € M if the derivative of the 
reciprocal hazard function of W, /i'^(a;; 77, 0) = d/dx [{1 — K{x;r],0)} /k{x;r],6)], equals ^ as 
X — )■ x^ = sup {x : K{x; 77, 6) < 1} (Von Mises' condition). We have 



is{FY{x);r},ip] 



as X — )• X 



K 



To prove the converse, we assume that the random variable W has tail index ^, i.e., 
lim^^^K h'Y/{x]'q,6) = ^. In other words, there exists a real-valued function s : M — )■ M with 
^vcn^^^Fy s{Fy(a;);T7, 0} = 1 such that h'y^{x]ri,6) = (x); 77, 0}. Writing hw{x;r],6) = 

/iy{Fy (x; r/); 0}//y (x; rj) we have 

h'yiFvix; 77); 6} - §^ V{Fy (x; 77); 6} = Cs{FY{x);'n, 6}. 

Jy (X, 77 j 

The solution of this first order linear differential equation is given by 

V{Fy(x;7?);6>} =e/y(x;77) j s{FY{x);v,0}dx, 
which is a separable differential equation with solution 

r-Fy(x;r,) dFYit;T]) \ 



Fv{Fy ix;ri)- 6} = l-exp 



10 CfYit;v)fs{FY{ty,ii,e}dt 
Under the change of variable z = FY^t; 77), we have 

dz 



Fviv;e) = 1 - exp 



(15) 



where v = Fy{x; 77). By assumption 9 is an at most d-dimensional vector of parameters. Hence 
equation ( 15 ) implies the existence of a (d— m)-dimensional vector of parameters t/j G ^' C R'^"'" 



such that expression (15) can be written as 



Fviv;e) = 1 - exp 



dz 



CMFyi(.);r7}/^ 



s{z;r),ip) 



dz 



and (t/j, 77) span 6. 
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